add nn/nb words, from Språkbanken freq-lists; remove some overlaps #19

Open
unhammer wants to merge 1 commit into nschum:master from unhammer:master

@unhammer

The new words are those that the Danish, Swedish, and other-Norwegian analysers do not recognise, e.g. for Nynorsk:

```
sudo apt install apertium-{dan,swe,nno,nob} # Using http://apertium.projectjj.com/apt/nightly

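# Keep only lines with a token the analyser did not recognise (lt-proc marks these with /*).
only_unknown () {
    grep '/\*'
}
# Strip analyses and stream delimiters, leaving the count and the bare word form.
only_wordform () {
    sed 's,/\*.*,,; s,/[0-9]*<.*\$,,; s/[][$^]//g'
}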

wget http://www.nb.no/sbfil/tekst/1gram_nno_f1_freq.zip
unzip 1gram_nno_f1_freq.zip

<1gram_nno_f1_f.frk iconv -f iso-8859-1 -t utf-8 \
  | apertium-destxt \
  | lt-proc /usr/share/apertium/apertium-dan/dan.automorf.bin \
  | only_unknown \
  | only_wordform \
  | apertium-destxt \
  | lt-proc /usr/share/apertium/apertium-nob/nob.automorf.bin \
  | only_unknown \
  | only_wordform \
  | apertium-destxt \
  | lt-proc /usr/share/apertium/apertium-swe/swe.automorf.bin \
  | only_unknown \
  | only_wordform \
  | head -100  \
  | sed 's/ *[0-9]*  *//' \
  | sed 's/.*/"&"/'

```
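
To see what the two helpers do without installing any analyser, here is a minimal sketch that runs them on two hand-written lines. The sample input is hypothetical: it only imitates the general shape of `lt-proc` output (`^surface/analysis$`, with `/*` marking an unknown word) and is simplified compared to what the real pipeline produces. The tags and counts are made up for illustration.

```shell
# Same helpers as in the pipeline above.
only_unknown () {
    grep '/\*'
}
only_wordform () {
    sed 's,/\*.*,,; s,/[0-9]*<.*\$,,; s/[][$^]//g'
}

# Hypothetical, simplified analyser output (not real lt-proc output):
# the first line holds a recognised word, the second an unknown one (/*).
sample='^123/123<num>$ ^hus/hus<n><nt><sg><ind>$
^45/45<num>$ ^kjem/*kjem$'

# Filter, strip the analyses, then drop the count and quote the word,
# mirroring the last two sed steps of the pipeline.
result=$(printf '%s\n' "$sample" \
  | only_unknown \
  | only_wordform \
  | sed 's/ *[0-9]*  *//' \
  | sed 's/.*/"&"/')

printf '%s\n' "$result"   # prints "kjem"
```

The recognised line is dropped by `only_unknown`, and the unknown one is reduced to `45 kjem` by `only_wordform` before the final two `sed` calls turn it into a quoted word.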